Handling OS Signals

Learn to handle OS signals in CLI applications using the os/signal and context packages.

When writing CLI applications, there are occasions when a developer wants to handle OS signals. The most common example is a user trying to exit a program, usually through a keyboard shortcut.

In these cases, we may want to do some file cleanup before exiting or cancel a call we made to a remote system.

In this lesson, we’ll talk about how we can capture and respond to these events to make our applications more robust.


Capturing an OS signal

Go deals with two types of OS signals:

  • Synchronous

  • Asynchronous

Synchronous signals are triggered by errors in program execution, such as SIGBUS, SIGFPE, and SIGSEGV. Go converts these into runtime panics, so they can be intercepted by calling recover() in a deferred function. Asynchronous signals vary by platform, but for a Go programmer, the most relevant are as follows:

  • SIGHUP: The connected terminal disconnected.

  • SIGTERM: Please quit and do cleanup (generated from a program).

  • SIGINT: The same as SIGTERM (sent from the terminal, by default with Ctrl + C).

  • SIGQUIT: The same as SIGTERM plus a core dump (sent from the terminal, by default with Ctrl + \).

  • SIGKILL: The program must quit; this signal cannot be captured.

When these signals arrive, intercepting them lets us cancel ongoing operations and clean up before exiting. Note that SIGKILL cannot be intercepted, and SIGHUP only indicates that a process has lost its terminal, not necessarily that it was canceled. This could be because it was moved to the background or because of a similar event.
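To illustrate the synchronous case mentioned earlier, here is a minimal sketch of turning a runtime panic into an ordinary error with a deferred function. It dereferences a nil pointer, a fault that typically reaches the process as SIGSEGV and that the Go runtime reports as a panic. The helper name safeDeref is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// safeDeref is a hypothetical helper for illustration. It dereferences p and
// converts any runtime panic raised along the way into an ordinary error.
func safeDeref(p *int) (v int, err error) {
	defer func() {
		// recover returns a non-nil value only while a panic is in flight.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return *p, nil
}

func main() {
	// A nil pointer triggers the panic, which the deferred function catches.
	if _, err := safeDeref(nil); err != nil {
		fmt.Println(err)
	}

	// A valid pointer is dereferenced normally.
	n := 42
	v, _ := safeDeref(&n)
	fmt.Println(v)
}
```

Recovering like this is best reserved for well-defined boundaries, since a panic usually signals a bug that should be fixed rather than hidden.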

To capture a signal, we can use the os/signal package. This package allows a program to receive notifications of a signal from an OS and respond. Here is a simple example:

Capturing a signal with the os/signal package

After running the code above, press Ctrl + C in the terminal to send a signal. Make sure to do so before execution ends.

This code does the following:

  • Line 24: Creates a channel, signals, on which to receive signals.

  • Lines 27–29: Subscribes to signals of the SIGINT, SIGTERM, and SIGQUIT types.

  • Lines 33–38: Uses a goroutine to handle incoming signals, which does the following:

    • Calls the cleanup() function to handle program cleanup.

    • Exits with status code 1 on SIGINT and SIGTERM.

    • Panics on SIGQUIT, which gives a basic core dump.
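The steps described above can be sketched roughly as follows. This is a minimal reconstruction rather than the lesson's original listing, so its line numbers do not correspond to the ones cited in the walkthrough. The exitSignals variable and the wantsDump() helper are hypothetical names introduced for this sketch, and the 5-second sleep simply stands in for real work.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// exitSignals lists the signals this sketch subscribes to (hypothetical name).
var exitSignals = []os.Signal{syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT}

// wantsDump reports whether sig should end the program with a panic rather
// than a plain exit (hypothetical helper).
func wantsDump(sig os.Signal) bool {
	return sig == syscall.SIGQUIT
}

// cleanup is a placeholder for real cleanup work, such as removing files or
// canceling remote calls.
func cleanup() {
	fmt.Println("cleanup called")
}

func main() {
	// The channel is buffered because package signal does not block when
	// delivering a signal to it.
	signals := make(chan os.Signal, 1)
	signal.Notify(signals, exitSignals...)

	// Handle the first incoming signal in a separate goroutine.
	go func() {
		sig := <-signals
		cleanup()
		if wantsDump(sig) {
			panic("SIGQUIT called")
		}
		os.Exit(1)
	}()

	fmt.Println("Sleeping for 5 seconds; press Ctrl + C to interrupt")
	time.Sleep(5 * time.Second)
	fmt.Println("Finished without receiving a signal")
}
```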

Signal-handling code should live in our main package. The cleanup() function should contain the calls that handle outstanding items, such as remote call cancellations and file cleanup.

Note: We can control the amount of data in a core dump, and how it is generated, with the GOTRACEBACK environment variable. It is described in the documentation for the runtime package.

Using Context to cancel

The primary way to stop in-flight operations in Go is the cancellation feature of the context.Context type.

By creating a Context object with cancellation in main() and passing it to all function calls, we can effectively cancel all ongoing work. This can be handy when we want to stop processing and clean up because a user hits Ctrl + C.

We are going to show a more advanced signal-handling approach in a program that does the following:

  • Creates a new temporary file every 1 second for 30 seconds

  • Cleans up files if the program is canceled

Let's start by creating a function to handle our signals:

Function to handle signals

This code does the following:

  • Line 1: Defines a function called handleSignal() that takes an argument called cancel, which is used to signal a function chain to stop processing.

  • Line 2: Creates an out channel that we use to return the signal received.

  • Line 3: Creates a notify channel to receive signal notifications.

  • Lines 11–25: Creates a goroutine to receive signals:

    • If the signal is for exiting, calls cancel().

    • Returns the signal that told us to exit.

    • If it is some other signal, just logs it.

Now, let's create a function that creates our files:

The code to create files

This code does the following:

  • Lines 2–11: Loops 30 times, and on each iteration does the following:

    • Checks whether our ctx is canceled.

    • If so, returns the error.

    • Otherwise, creates a file in tmpFiles.

    • Sleeps for 1 second between file creations.

This code will create files in tmpFiles named from 0 to 29 unless there is a problem writing the file or the Context is canceled. Now, we need some code to clean up the files if we receive a quit signal. If we don't, the files are left alone:

The function to clean up files

This code does the following:

  • Lines 2–5: Uses os.RemoveAll() to remove the files. Also removes the temporary directory.

  • Line 6: Notifies the user that cleanup was done.

Let's tie it all together with our main():

Executing the code

This code does the following:

  • Line 17: Creates a temporary file directory.

  • Line 24: Creates a root Context object, ctx:

    • ctx can be canceled with cancel().

  • Line 50: Calls our handleSignal() to handle any signal to quit.

  • Lines 28–42: Executes our createFiles() function:

    • If we have an error, we call cleanup().

    • After cleanup, we see whether we received a signal as opposed to just an error.

    • If it is a signal and it is SIGQUIT, we call panic(). This is because SIGQUIT should core-dump by definition.

    • If it was just an error, prints the error and returns an error code.

Note: The code must be built with go build and run as a binary. It cannot be run with go run, as the go binary that forks our program will intercept the signal before our program can.

Go can produce several kinds of core dumps, selected with the GOTRACEBACK environment variable, which is described in the documentation for the runtime package.

Cancellation with Cobra

When Cobra was initially created, the context package did not exist. In 2020, Cobra was patched to allow a Context object to be passed into cobra.Command. Unfortunately, the Cobra generator was not updated to generate the necessary boilerplate.

To add signal handling as we did previously, we need to make a few modifications, starting with the main.go file:

Modified main function

We'll also need to modify handleSignal():

Modified handleSignal function

Finally, we must change cmd/root.go as well. The complete code is shown below:

Signal handling in Cobra

Run go run main.go cmd2 in the terminal, then press Ctrl + C to send a signal and stop the execution.

We now have signal handling. When writing our Run function, we can use cmd.Context() to retrieve the Context object and look for cancellation.

One of the early Google systems to help automate the network was a system called Chipmunk. Chipmunk contained authoritative data on the network and would generate router configurations from that data. Like most software, Chipmunk started off working fast and saving a lot of time. As the network continued its yearly tenfold growth, the limits of its design and language choice began to show. Chipmunk was built on Django and Python and was not designed for horizontal scaling. As the system became busy, configuration requests would start to take 30 minutes or longer, while the timers for these requests had limits of no more than 30 minutes.

The design had a fatal flaw when generation approached these limits – if a request was canceled, the cancellation was not signaled to the running configuration generator.

This meant that if generation took 25 minutes but was canceled 1 minute in, the generator would spend the next 24 minutes working, with no one to receive the work. When a call reached the time limit, the callers would time out and retry, but the generator was still working on the previous call. This would lead to a cascading failure, as multiple compute-heavy calculations were running, some of which no longer had a receiver. The extra load would push the new call over the time limit as well, because the Python Global Interpreter Lock prevents true multithreading and each additional call doubled CPU usage.

One of the keys to dealing with this type of failure scenario is being able to cancel jobs that are no longer needed. This is why it is so important to pipe a context.Context object throughout a function call chain and look for cancellation at logical points. This can greatly reduce the load on a system that reaches a threshold and reduce the damage of distributed denial-of-service (DDoS) attacks.

This lesson has looked at how a program can intercept OS signals and respond to those signals. It has provided an example of using Context to handle canceling executions that can be used in any application. We have discussed how we can integrate that into programs generated with the Cobra generator.
